Using the Distributional Hypothesis to Derive Cooccurrence Scores from the British National Corpus

Author

David Hardcastle

Abstract

In this paper I examine a number of cooccurrence-based scoring systems that use the British National Corpus to measure word association over wide contexts. The principal aim is to address the question of how to evaluate a given scoring system, or how to compare two scoring systems, without relying on a small list of example pairs and a ‘feel’ for the results. I evaluate these systems using i) a list of noun-noun pairs and ii) a simple test on aligned and misaligned sets of nouns. I also consider why noun-noun pairs are deemed appropriate for such mechanisms and explore the prospects for determining which words or lemmas will be appropriate for a distributional scoring approach. For my specific application, an algorithm similar to MI-score operating across multiple windows offers the best results.
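
The abstract does not give the scoring formula, so the following is only a minimal sketch of what an MI-style score computed over several window sizes might look like, not the paper's algorithm. The function names (`count_pair`, `mi_score`, `multi_window_score`), the expected-count estimate, the default window sizes, and the choice to average over windows are all assumptions made for illustration.

```python
import math
from collections import Counter

def count_pair(tokens, w1, w2, window):
    """Count occurrences of w1 and w2 (two distinct words) that lie
    at most `window` tokens apart, in either order."""
    count = 0
    for i, tok in enumerate(tokens):
        following = tokens[i + 1 : i + 1 + window]
        if tok == w1:
            count += following.count(w2)
        elif tok == w2:
            count += following.count(w1)
    return count

def mi_score(tokens, w1, w2, window):
    """MI-style score: log2 of observed over expected cooccurrence.
    The expected count assumes the two words are distributed
    independently (an assumption of this sketch)."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = count_pair(tokens, w1, w2, window)
    if observed == 0:
        return float("-inf")
    expected = 2 * window * freq[w1] * freq[w2] / n
    return math.log2(observed / expected)

def multi_window_score(tokens, w1, w2, windows=(5, 10, 20)):
    """Average the MI-style score over several window sizes, skipping
    windows in which the pair never cooccurs (an assumed way of
    combining windows)."""
    scores = [mi_score(tokens, w1, w2, w) for w in windows]
    finite = [s for s in scores if s != float("-inf")]
    return sum(finite) / len(finite) if finite else float("-inf")
```

Calling `multi_window_score(tokens, "doctor", "nurse")` on a tokenized and lemmatized corpus would return a single association score for the pair. Wider windows raise the expected count, so a pair must cooccur correspondingly more often to keep a high score.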

Similar resources

Testing the Distributional Hypothesis: The Influence of Context on Judgments of Semantic Similarity

Distributional information has recently been implicated as playing an important role in several aspects of language ability. Learning the meaning of a word is thought to be dependent, at least in part, on exposure to the word in its linguistic contexts of use. In two experiments, we manipulated subjects’ contextual experience with marginally familiar and nonce words. Results showed that similar...

A Corpus-Based Study of the Lexical Make-up of Applied Linguistics Article Abstracts

This paper reports results from a corpus-based study that explored the frequency of words in the abstracts of applied linguistics journal articles. The abstracts of major articles in leading applied linguistics journals, published since 2005 up to November 2001 were analyzed using software modules from the Compleat Lexical Tutor. The output includes a list of the most frequent content words, list...

Identifying semantic relations in a specialized corpus through distributional analysis of a cooccurrence tensor

We describe a method of encoding cooccurrence information in a three-way tensor from which HAL-style word space models can be derived. We use these models to identify semantic relations in a specialized corpus. Results suggest that the tensor-based methods we propose are more robust than the basic HAL model in some

Situation and Text: Representation of Migrants Whilst the Escalation of Refugee Crisis in Great Britain as Compared to Russia

Increasing migration is a vital concern for a globalizing sociocultural environment in today’s world. The UK and developed European countries have become an attractive destination for asylum seekers (labelled as “migrants”) in the past decade. The rapid rise in the number of asylum seekers, which was labelled “migration crisis” (Ruz, 2015), made this topic an integral part of scientific discuss...

Unsupervised induction of stochastic context-free grammars using distributional clustering

An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguisti...

Publication date: 2006
